Reading notes: Reasoning with latent structure refinement for document-level relation extraction

2 minute read

Published:

Reasoning with Latent Structure Refinement for Document-Level Relation Extraction

arXiv link of the source paper

1. Contributions of this paper:

1. Achieves better document-level RE/reasoning results using a stronger graph that captures more non-local interactions. The model is called LSR (Latent Structure Refinement).

2. The graph is treated as a latent variable, induced automatically with a refinement strategy that incrementally aggregates relevant information.

3. Example of how the model can do step-by-step “relational reasoning”:

  • Lutsenko is a former minister of internal affairs.
  • He occupied this post in the cabinets of Yulia Tymoshenko.
  • The ministry of internal affairs is the Ukrainian police authority. The question is: what is the relation between “Yulia Tymoshenko” and “Ukrainian”? When reading the three sentences, the model will know:
    1. Lutsenko and “He” are mentions of the same entity
    2. Lutsenko works with Tymoshenko
    3. Lutsenko managed internal affairs, which is a Ukrainian authority
    4. Thus Lutsenko is Ukrainian -> Tymoshenko, in whose cabinet he served, has Ukrainian nationality

2. Limitations of current practices?

1. Most models work within sentences, ignoring the rich information across sentences:

  • This is because most current graph-based models (even document-level ones) build their dependencies from (1) syntactic trees, (2) coreference, or (3) heuristics
  • They do not perform “relational reasoning”

3. About the LSR model

1. What are the major subtasks? (a rough sketch of how they compose follows this list)

  1. Node constructor
    • (1) encodes sentences with contextual representations
    • (2) constructs representations for 3 types of nodes: mention, entity, and MDP (meta dependency path) nodes
    • What are MDP nodes? Each sentence has multiple dependency paths between its mention pairs; the shortest paths form the meta dependency paths, and the tokens on them are the MDP nodes.
  2. Dynamic reasoner
    • First, it induces a document-level structure over the extracted nodes
    • Second, it keeps updating the node representations via information propagation (iteratively refined)
  3. Classifier
    • Given the final node representations after refinement, calculate classification scores
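
To make the division of labor concrete, here is a minimal PyTorch-style skeleton of how the three subtasks might compose. All class, method, and argument names here are my own illustration, not the paper's released code.

```python
import torch
import torch.nn as nn

class LSRSketch(nn.Module):
    """Rough skeleton of the LSR pipeline: a node constructor builds
    mention/entity/MDP node representations, N dynamic-reasoner blocks
    induce a latent structure and propagate over it, and a classifier
    scores entity pairs. Names are illustrative, not the official code."""

    def __init__(self, node_constructor, reasoner_blocks, classifier):
        super().__init__()
        self.node_constructor = node_constructor          # encoder + node pooling
        self.reasoners = nn.ModuleList(reasoner_blocks)   # N stacked blocks
        self.classifier = classifier                      # bilinear scorer

    def forward(self, document):
        # 1) Node constructor: contextual encoding -> node representations
        nodes, entity_index = self.node_constructor(document)
        # 2) Dynamic reasoner, applied N times (iterative refinement):
        #    induce a soft structure, then propagate information over it
        for block in self.reasoners:
            adjacency = block.induce(nodes)    # matrix-tree edge marginals
            nodes = block.propagate(adjacency, nodes)
        # 3) Classifier: score every candidate entity pair
        entities = nodes[entity_index]
        return self.classifier(entities)
```
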

2. Choice of actual model components?

  • Context encoder: BiLSTM or BERT

3. Steps

  1. Context encoding
    • Given a document d, for sentence d_i, its j-th word has hidden state ![](https://latex.codecogs.com/gif.latex?\inline&space;h_{ij}&space;=&space;[\overrightarrow{h}_{ij};\overleftarrow{h}_{ij}]), the concatenation of the hidden states of the two directions.
    • ![](https://latex.codecogs.com/gif.latex?\inline&space;\overrightarrow{h}_{ij}&space;=&space;\overrightarrow{\mathrm{LSTM}}(\overrightarrow{h}_{i,j-1},&space;x_{ij}%29), where ![](https://latex.codecogs.com/gif.latex?\inline&space;x_{ij}) is the word embedding of the j-th token.
    • Similarly, ![](https://latex.codecogs.com/gif.latex?\inline&space;\overleftarrow{h}_{ij}&space;=&space;\overleftarrow{\mathrm{LSTM}}(\overleftarrow{h}_{i,j+1},&space;x_{ij}%29)
    • In this way, mention nodes and MDP nodes get their representations from these token states. Entity nodes are constructed by averaging their coreferent mention nodes (a minimal encoder sketch follows).
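
A minimal sketch of the BiLSTM variant of the context encoder. PyTorch's `nn.LSTM` with `bidirectional=True` already returns the concatenation of the two directional hidden states, so the forward and backward updates above come for free; the vocabulary and dimensions below are made up for illustration.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """BiLSTM context encoder: returns h_ij = [forward h_ij ; backward h_ij]
    for every token. Vocabulary size and dimensions are illustrative."""

    def __init__(self, vocab_size=30000, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids)                  # (batch, seq_len, embed_dim)
        h, _ = self.lstm(x)                        # (batch, seq_len, 2*hidden_dim)
        return h

# A mention node can then be represented by pooling the h vectors of its
# span, and an entity node by averaging its coreferent mention nodes.
encoder = ContextEncoder()
h = encoder(torch.randint(0, 30000, (1, 12)))      # one 12-token sentence
print(h.shape)                                     # torch.Size([1, 12, 256])
```
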
  2. Node extraction
    • Use the tokens on the shortest dependency paths between mentions in a sentence. This effectively keeps the relevant information while discarding the irrelevant parts (see the sketch below).
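
A small sketch of how MDP nodes could be extracted, assuming a dependency parse of the sentence is available; `mdp_tokens` is a hypothetical helper of mine, with networkx doing the shortest-path work.

```python
import itertools
import networkx as nx

def mdp_tokens(dep_edges, mention_heads):
    """Return token indices on the meta dependency paths (MDP) of a sentence.

    dep_edges: (head, dependent) index pairs from a dependency parser
    mention_heads: token indices of the entity mentions in the sentence

    The union of tokens on the shortest dependency path between every
    pair of mentions becomes the set of MDP nodes.
    """
    graph = nx.Graph(dep_edges)   # treat the parse tree as an undirected graph
    tokens = set()
    for a, b in itertools.combinations(mention_heads, 2):
        tokens.update(nx.shortest_path(graph, a, b))
    return tokens

# toy parse: token 1 governs tokens 0, 2, and 3; mentions at tokens 0 and 3
print(mdp_tokens([(1, 0), (1, 2), (1, 3)], [0, 3]))   # {0, 1, 3}
```
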
  3. Structure induction
    • Induce a latent dependency structure using structured attention and a variant of Kirchhoff’s Matrix-Tree Theorem (a sketch of the full step follows)
    • Given the representations of the i-th and j-th nodes, u_i and u_j, calculate the pair-wise unnormalized attention score between them: ![](https://latex.codecogs.com/gif.latex?\inline&space;s_{ij}&space;=&space;(\tanh(W_pu_i%29%29^TW_b(\tanh(W_cu_j%29%29)
    • Compute the root score, the unnormalized probability of the i-th node being selected as the root of the structure: ![](https://latex.codecogs.com/gif.latex?\inline&space;s_i^r&space;=&space;W_ru_i)
    • Calculate the marginal probability of each dependency edge; these marginals form the soft adjacency matrix of the induced structure
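
The bullets above compress a fair amount of math, so here is a sketch of the whole induction step: pairwise and root scores, then edge marginals via the Matrix-Tree Theorem, following the structured-attention formulation of Koo et al. (2007) and Liu and Lapata (2018). Batching, masking, and numerical-stability details of the actual implementation are omitted.

```python
import torch

def pair_scores(U, Wp, Wc, Wb, Wr):
    """Unnormalized attention scores. U: (n, d) node representations.
    Row-vector convention, so weights act as transposes of the formulas."""
    p = torch.tanh(U @ Wp)                  # tanh(W_p u_i), "parent" view
    c = torch.tanh(U @ Wc)                  # tanh(W_c u_j), "child" view
    s = p @ Wb @ c.t()                      # s_ij, shape (n, n)
    root = (U @ Wr).squeeze(-1)             # s_i^r, shape (n,)
    return s, root

def tree_marginals(s, root):
    """Edge marginals of the latent dependency structure via the
    Matrix-Tree Theorem; the result is the soft adjacency matrix A."""
    n = s.size(0)
    P = torch.exp(s) * (1 - torch.eye(n))   # edge weights, no self-loops
    r = torch.exp(root)                     # root weights
    L = torch.diag(P.sum(dim=0)) - P        # Laplacian
    L_bar = L.clone()
    L_bar[0] = r                            # root-augmented Laplacian
    L_inv = torch.inverse(L_bar)
    not_first = (torch.arange(n) != 0).float()
    # A_ij = (1 - d_j1) P_ij [L^-1]_jj  -  (1 - d_i1) P_ij [L^-1]_ji
    A = (not_first.unsqueeze(0) * P * L_inv.diagonal().unsqueeze(0)
         - not_first.unsqueeze(1) * P * L_inv.t())
    return A

# toy check: 4 random nodes of dimension 8
d, n = 8, 4
U = torch.randn(n, d)
Wp, Wc, Wb = torch.randn(d, d), torch.randn(d, d), torch.randn(d, d)
Wr = torch.randn(d, 1)
A = tree_marginals(*pair_scores(U, Wp, Wc, Wb, Wr))
print(A.shape)   # torch.Size([4, 4])
```
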
  4. Multi-hop reasoning
    • Use a GNN, more specifically a GCN (graph convolutional network)
    • A graph G with n nodes can be represented as an n x n adjacency matrix A, which is induced by the previous structure-induction step
    • A node i at layer l has representation ![](https://latex.codecogs.com/gif.latex?\inline&space;u_i^l), computed from the layer l-1 representations ![](https://latex.codecogs.com/gif.latex?\inline&space;u_j^{l-1}):
    • Formula: ![](https://latex.codecogs.com/gif.latex?\inline&space;u_{i}^{l}&space;=&space;\sigma&space;(\sum_{j=1}^n&space;A_{ij}W^lu_j^{l-1}&space;+&space;b^l%29)
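
A direct transcription of this update rule as one PyTorch layer. I assume σ is ReLU here, and A is the soft adjacency matrix produced by structure induction.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN propagation step: u_i^l = sigma(sum_j A_ij W u_j^{l-1} + b)."""

    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.empty(dim, dim))
        self.b = nn.Parameter(torch.zeros(dim))
        nn.init.xavier_uniform_(self.W)

    def forward(self, A, U):
        # A: (n, n) soft adjacency; U: (n, dim) node states at layer l-1.
        # By linearity, (A @ U) @ W.t() == sum_j A_ij (W u_j^{l-1}).
        return torch.relu(A @ U @ self.W.t() + self.b)
```
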
  5. Iterative refinement
    • Refine the structure N times by stacking N blocks of the dynamic reasoner
    • The induced structure becomes increasingly refined as more information is aggregated (see the combined sketch below)
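
Tying the two previous sketches together, one refinement loop might look like the following. It reuses the toy tensors from the structure-induction sketch; in the paper each block has its own scoring parameters, which I elide here for brevity.

```python
# N stacked dynamic-reasoner blocks (illustrative; reuses pair_scores,
# tree_marginals, and GCNLayer from the sketches above)
N = 2
gcn_blocks = [GCNLayer(d) for _ in range(N)]
for gcn in gcn_blocks:
    s, root = pair_scores(U, Wp, Wc, Wb, Wr)   # scores from the *current* U
    A = tree_marginals(s, root)                # structure induced afresh
    U = gcn(A, U)                              # propagate; U feeds the next block
```
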
  6. Classifier
    • After N refinements we obtain the final representations of all nodes. Compute the probability that an entity pair (e_i, e_j) holds a relation r with a bilinear function:
    • Formula: ![](https://latex.codecogs.com/gif.latex?\inline&space;P(r\mid&space;e_i,&space;e_j%29&space;=&space;\sigma&space;(e_i^TW_ee_j&space;+&space;b_e%29_r)
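
A sketch of this bilinear classifier: `nn.Bilinear` keeps one d×d matrix per output, i.e. one W_e slice per relation, and the sigmoid reflects multi-label prediction where each relation is an independent binary decision. Dimensions below are illustrative.

```python
import torch
import torch.nn as nn

class BilinearClassifier(nn.Module):
    """P(r | e_i, e_j) = sigmoid(e_i^T W_e e_j + b_e)_r for each relation r."""

    def __init__(self, dim, num_relations):
        super().__init__()
        # nn.Bilinear computes x1^T A_k x2 + b_k for each output feature k
        self.bilinear = nn.Bilinear(dim, dim, num_relations)

    def forward(self, e_i, e_j):
        # sigmoid, not softmax: relations are independent binary labels
        return torch.sigmoid(self.bilinear(e_i, e_j))

clf = BilinearClassifier(dim=256, num_relations=96)    # DocRED has 96 relations
probs = clf(torch.randn(5, 256), torch.randn(5, 256))  # 5 candidate entity pairs
print(probs.shape)                                     # torch.Size([5, 96])
```
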